With the growing adoption of privacy-preserving machine learning algorithms such as Differentially Private Stochastic Gradient Descent (DP-SGD), training or fine-tuning models on private datasets has become increasingly prevalent. This shift creates a need for models offering varying privacy guarantees and utility levels to satisfy diverse user requirements. Managing numerous versions of large models introduces significant operational challenges, including increased inference latency, higher resource consumption, and elevated costs. Model deduplication is a technique widely used by model serving and database systems to support high-performance, low-cost inference queries and model diagnosis queries. However, no existing model deduplication work considers privacy, leading to unbounded aggregation of privacy costs for certain deduplicated models and to inefficiencies when deduplicating DP-trained models. We formalize the problem of deduplicating DP-trained models for the first time and propose a novel privacy- and accuracy-aware deduplication mechanism to address it. We develop a greedy strategy to select and assign base models to target models to minimize storage and privacy costs. When deduplicating a target model, we dynamically schedule accuracy validations and apply the Sparse Vector Technique to reduce the privacy costs associated with private validation data. Compared to baselines, our approach improves the compression ratio by up to 35× for individual models (including large language models and vision transformers). We also observe up to 43× inference speedup due to the reduction of I/O operations.
Free, publicly-accessible full text available June 17, 2026.
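The Sparse Vector Technique mentioned above can be illustrated with its classic AboveThreshold variant, which answers a stream of queries while paying privacy cost only for the (at most c) "above threshold" answers it releases. This is a generic sketch of the standard mechanism, not the paper's exact validation-scheduling procedure; the fixed seed is only for a reproducible demo.

```python
import numpy as np

def sparse_vector(queries, threshold, epsilon, c=1):
    """Sparse Vector Technique (AboveThreshold): report, for each
    sensitivity-1 query, whether its noisy value exceeds a noisy
    threshold; stop after c 'above' answers. Privacy cost scales with
    c, not with the total number of queries asked."""
    rng = np.random.default_rng(0)        # fixed seed for the demo only
    eps1, eps2 = epsilon / 2, epsilon / 2  # split the budget
    rho = rng.laplace(scale=1.0 / eps1)   # one-time noise on the threshold
    answers, above = [], 0
    for q in queries:
        nu = rng.laplace(scale=2.0 * c / eps2)  # fresh noise per query
        if q + nu >= threshold + rho:
            answers.append(True)
            above += 1
            if above >= c:                # 'above'-answer budget exhausted
                break
        else:
            answers.append(False)
    return answers
```

With a large epsilon the noise is negligible, so the mechanism tracks the true comparisons; in a deduplication setting, each query could be a validation-accuracy check against a private dataset.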
-
The abundance of various cell types can vary significantly among patients with different phenotypes, and even among those with the same phenotype. Recent scientific advances provide mounting evidence that clinical variables, such as age, gender, and lifestyle habits, can also influence the abundance of certain cell types. However, current methods for integrating single-cell-level omics data with clinical variables are inadequate. In this study, we propose a regularized Bayesian Dirichlet-multinomial regression framework to investigate the relationship between single-cell RNA sequencing data and patient-level clinical data. Additionally, the model employs a novel hierarchical tree structure to identify such relationships at different cell-type levels. Our model successfully uncovers significant associations between specific cell types and clinical variables across three distinct diseases: pulmonary fibrosis, COVID-19, and non-small cell lung cancer. This integrative analysis provides biological insights and could potentially inform clinical interventions for various diseases.
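The Dirichlet-multinomial likelihood at the core of such a regression can be written compactly in terms of log-gamma functions. The sketch below computes the log pmf of one sample's cell-type counts; `alpha` stands in for the covariate-dependent concentration parameters (e.g. the exponential of a linear predictor built from clinical variables) and is purely illustrative of the distribution, not of the authors' regularized Bayesian implementation.

```python
from math import exp, lgamma

def dm_loglik(counts, alpha):
    """Log pmf of a Dirichlet-multinomial: counts over K cell types for
    one sample, with concentration parameters alpha (alpha_k > 0)."""
    n = sum(counts)
    a0 = sum(alpha)
    ll = lgamma(a0) - lgamma(n + a0)
    for y, a in zip(counts, alpha):
        ll += lgamma(y + a) - lgamma(a)
    # multinomial coefficient (constant in alpha; included for a proper pmf)
    ll += lgamma(n + 1) - sum(lgamma(y + 1) for y in counts)
    return ll

# Sanity check: with alpha = (1, 1) and n = 2 draws over 2 cell types,
# the three possible count vectors are equally likely (probability 1/3 each).
probs = [exp(dm_loglik(c, [1.0, 1.0])) for c in [(2, 0), (1, 1), (0, 2)]]
```

In a regression setting, overdispersion relative to a plain multinomial is controlled by the magnitude of `a0`: smaller concentrations mean more between-sample variability in cell-type proportions.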
-
Decision forests, including RandomForest, XGBoost, and LightGBM, dominate machine learning tasks over tabular data. Recently, several frameworks were developed for decision forest inference, such as ONNX, TreeLite from Amazon, TensorFlow Decision Forests from Google, HummingBird from Microsoft, Nvidia FIL, and lleaves. While these frameworks are fully optimized for inference computation, they are all decoupled from databases and general data management frameworks, which leads to cross-system performance overheads. We first provided a DICT model to understand the performance gaps between decoupled and in-database inference. We further identified that for in-database inference, in addition to the popular UDF-centric representation that encapsulates the ML model in one User Defined Function (UDF), there also exists a relation-centric representation that breaks decision forest inference down into several fine-grained SQL operations. The relation-centric representation can achieve significantly better performance for large models. We optimized both implementations and conducted a comprehensive benchmark comparing them to the aforementioned decoupled inference pipelines and to existing in-database inference pipelines such as Spark SQL and PostgresML. The evaluation results validated the DICT model and demonstrated the superior performance of our in-database inference design compared to the baselines.
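The contrast between the two in-database representations can be sketched on SQLite with a one-split toy tree. In the UDF-centric form the whole model is opaque inside one scalar function; in the relation-centric form each leaf becomes rows of a predicate table, and prediction is a join plus group-and-count. The table layout and toy tree below are illustrative assumptions, not the paper's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE points(id INTEGER, x REAL, y REAL);
INSERT INTO points VALUES (1, 0.2, 0.9), (2, 0.8, 0.1);
-- relation-centric encoding: one row per (leaf, predicate on its root-to-leaf path)
CREATE TABLE leaf_pred(leaf INT, col TEXT, thr REAL, go_left INT, npred INT, label INT);
INSERT INTO leaf_pred VALUES
  (0, 'x', 0.5, 1, 1, 0),   -- leaf 0: x <  0.5 -> label 0
  (1, 'x', 0.5, 0, 1, 1);   -- leaf 1: x >= 0.5 -> label 1
""")

# UDF-centric: the model is a black box evaluated once per row.
conn.create_function("predict", 2, lambda x, y: 0 if x < 0.5 else 1)
udf = conn.execute("SELECT id, predict(x, y) FROM points ORDER BY id").fetchall()

# Relation-centric: a point lands in the leaf whose predicates it satisfies in full.
rel = conn.execute("""
SELECT p.id, lp.label
FROM points p JOIN leaf_pred lp
  ON (CASE lp.col WHEN 'x' THEN p.x ELSE p.y END < lp.thr) = lp.go_left
GROUP BY p.id, lp.leaf, lp.label, lp.npred
HAVING COUNT(*) = lp.npred
ORDER BY p.id""").fetchall()

assert udf == rel  # both representations agree on every point
```

Because the relation-centric query is ordinary SQL, the database optimizer can parallelize and pipeline it with the rest of the workload, which is one intuition for its advantage on large models.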
-
de Groot, Bert L. (Ed.)
Intrinsically disordered proteins (IDPs) are highly dynamic systems that play an important role in cell signaling processes, and their misfunction often causes human disease. Proper understanding of IDP function requires realistic characterization not only of their three-dimensional conformational ensembles at atomic resolution but also of the time scales of interconversion between their conformational substates. Large sets of experimental data are often used in combination with molecular modeling to restrain or bias models to improve agreement with experiment. It is shown here for the N-terminal transactivation domain of p53 (p53TAD) and Pup, two IDPs that fold upon binding to their targets, how the latest advancements in molecular dynamics (MD) simulation methodology produce native conformational ensembles by combining replica exchange with series of microsecond MD simulations. They closely reproduce experimental data at the global conformational ensemble level, in terms of the distribution properties of the radius of gyration tensor, and at the local level, in terms of NMR properties including 15N spin relaxation, without the need for reweighting. Further inspection revealed that 10–20% of the individual MD trajectories display the formation of secondary structures not observed in the experimental NMR data. The IDP ensembles were analyzed by graph theory to identify dominant inter-residue contact clusters and characteristic amino-acid contact propensities. These findings indicate that modern MD force fields with residue-specific backbone potentials can produce highly realistic IDP ensembles sampling a hierarchy of nano- and picosecond time scales, providing new insights into their biological function.
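The radius-of-gyration tensor used above as a global ensemble observable is straightforward to compute from a conformation's coordinates. The sketch below assumes equal atomic masses (a common simplification for Cα-only analyses) and is generic, not the authors' analysis code.

```python
import numpy as np

def gyration_tensor(coords):
    """Radius-of-gyration tensor S of an N x 3 coordinate array,
    assuming equal masses: S = (1/N) * C^T C with C the centered
    coordinates. trace(S) = Rg^2; the sorted eigenvalues describe
    the size and shape (e.g. asphericity) of the conformation."""
    c = coords - coords.mean(axis=0)   # center on the mean position
    return c.T @ c / len(c)

# Four points on a unit square in the xy-plane: a flat conformation.
xyz = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
S = gyration_tensor(xyz)
rg2 = np.trace(S)          # squared radius of gyration
```

Over an ensemble, histograms of the eigenvalues (or of Rg itself) give the distribution-level quantities that can be compared against scattering or NMR-derived estimates.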
-
In an epoch dominated by escalating concerns over climate change and looming energy crises, the imperative to design highly efficient catalysts that can facilitate the sequestration and transformation of carbon dioxide (CO2) into beneficial chemicals is paramount. This research presents the successful synthesis of nanofiber catalysts incorporating monometallic nickel (Ni) and cobalt (Co) and their bimetallic blend, NiCo, via a facile electrospinning technique, with precise control over the Ni/Co molar ratios. Application of an array of advanced analytical methods, including SEM, TGA–DSC, FTIR-ATR, XRD, Raman, XRF, and ICP-MS, validated the effective integration and homogeneous distribution of active Ni/Co catalysts within the nanofibers. The catalytic performance of these mono- and bimetallic Ni/Co nanofiber catalysts was systematically examined under ambient pressure conditions for CO2 hydrogenation reactions. The bimetallic NiCo nanofiber catalysts, specifically with a Ni/Co molar ratio of 1:2 and thermally treated at 1050 °C, demonstrated a high CO selectivity (98.5%) and a marked increase in CO2 conversion rate: up to 16.7 times that of the monometallic Ni nanofiber catalyst and 10.8 times that of the monometallic Co nanofiber catalyst. This significant enhancement in catalytic performance is attributed to the improved accessibility of active sites, minimized particle size, and the strong Ni–Co–C interactions within these nanofiber structures. These nanofiber catalysts offer a unique model system that illuminates the fundamental aspects of supported catalysis and accentuates its crucial role in addressing pressing environmental challenges.
-
Background and Aim: Copper is an essential trace metal serving as a cofactor in innate immunity, metabolism, and iron transport. We hypothesize that copper deficiency may influence survival in patients with cirrhosis through these pathways. Methods: We performed a retrospective cohort study involving 183 consecutive patients with cirrhosis or portal hypertension. Copper from blood and liver tissues was measured using inductively coupled plasma mass spectrometry. Polar metabolites were measured using nuclear magnetic resonance spectroscopy. Copper deficiency was defined by serum or plasma copper below 80 µg/dL for women or 70 µg/dL for men. Results: The prevalence of copper deficiency was 17% (N=31). Copper deficiency was associated with younger age, race, zinc and selenium deficiency, and higher infection rates (42% vs. 20%, p=0.01). Serum copper correlated positively with albumin, ceruloplasmin, and hepatic copper, and negatively with IL-1β. Levels of polar metabolites involved in amino acid catabolism, mitochondrial transport of fatty acids, and gut microbial metabolism differed significantly according to copper deficiency status. During a median follow-up of 396 days, mortality was 22.6% in patients with copper deficiency compared with 10.5% in patients without. Liver transplantation rates were similar (32% vs. 30%). Cause-specific competing risk analysis showed that copper deficiency was associated with a significantly higher risk of death before transplantation after adjusting for age, sex, MELD-Na, and Karnofsky score (HR: 3.40; 95% CI, 1.18–9.82; p=0.023). Conclusions: In advanced cirrhosis, copper deficiency is relatively common and is associated with an increased infection risk, a distinctive metabolic profile, and an increased risk of death before transplantation.